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Sparse representation of astronomical images is discussed. It is shown that a 
significant gain in sparsity is achieved when particular mixed dictionaries are 
used for approximating these types of images with greedy selection strategies. 
Experiments are conducted to confirm: i) Effectiveness at producing sparse 
representations, ii) Competitiveness, with respect to the time required to 
process large images. The latter is a consequence of the suitability of the 
proposed dictionaries for approximating images in partitions of small blocks. 
This feature makes it possible to apply the effective greedy selection technique 
Orthogonal Matching Pursuit, up to some block size. For blocks exceeding 
that size a refinement of the original Matching Pursuit approach is considered. 
The resulting method is termed Self Projected Matching Pursuit, because 
is shown to be effective for implementing, via Matching Pursuit itself, the 
optional back-projection intermediate steps in that approach. 



1. Introduction 

A common first step in most image processing techniques is to map the image onto a trans- 
formed space allowing for the reduction in the number of points to represent the image, up 
to some desired precision. For a significant reduction in the data dimensionality, from say N 
to K < N points, the image is said to be K-sparse in the transformed domain. In addition 
to the many apphcations that benefit from sparse representation of information [1~5], the 
emerging theory of sampling, called compressive sensing/ sampling, asserts that sparsity of 
a representation may also lead to more economical data collection [6-10]. The relevance of 
compressive sensing within the context of astronomical data is discussed in [11,12], where 
algorithms for signal recovery are advanced and illustrated by these types of data. The usual 
compressive sensing framework assumes that the signal is sparse in an orthogonal basis or 
incoherent dictionary, because most of the recovery proofs have been achieved under those 



conditions. However, recent theoretical results expand the analysis to coherent dictionar- 
ies [13, 14], because it is often the case that an approximation is sparser when elements from 
such a dictionary are used in the decomposition. Alternatively, as shown in [15, 16], high 
sparsity enables exploitation of the redundancy in the pixel intensity representation of an 
image, to reduce the image size when encrypted. The success of this technique, termed En- 
crypted Image Folding (EIF), strongly depends on the sparsity of the image representation. 
The sparser the representation is the smaller the size of the folded image. 

In this Communication we wish to highlight the significant gain in sparsity that may be 
obtained by releasing the condition of incoherence when designing a dictionary for sparse 
representation of astronomical images. The problem we address is described as follows: 

Given an astronomical image, find its sparse decomposition as a superposition of elemen- 
tary components, selected from a large redundant set called a dictionary. 

It is clear that the success of producing a very sparse representation of an image depends 
in a large part on the ability to construct appropriate dictionaries from which to select the 
right elements, frequently called 'atoms'. Here a mixed dictionary is considered, which will 
be shown to be suitable for achieving sparse representations of astronomical images. A useful 
dictionary for this purpose should be capable of sparsely representing two different features; 
i)fairly smooth regions (nebulae) of intergalactic media, gases, dust, etc., and ii)bright spots 
(stars). In order to account for smooth regions we use a Redundant Discrete Cosine (RDC) 
dictionary. The model of bright spots and edges is accomplished by the union of B-sphne 
dictionaries of different order and support. The combination of these two types of dictionaries 
provides us with a mixed dictionary yielding a very significant gain in the sparsity of an astro- 
nomical image, in relation to the outcomes from the most commomly used transformations 
in image processing; the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform 
(DWT). Their convenient distinctive feature is that they are suitable for processing large 
images by segmenting them into small blocks. The advantage of this property is twofold: a) It 
entails storage requirements which are affordable for processing by effective pursuit strate- 
gies. b)The sequential processing of blocks is fast enough to be practical and there is also 
room for the possibility of straightforward parallel processing when those resources become 
widely available. 

The numerical experiments for illustrating the approach involve large images from the 
European Southern Observatory (ESO) [17] and a set of fifty five images captured by the 
Hubble Space Telescope (HST) [18]. Considerations are restricted to approximations of high 
quality (PSNR of 45 dB or higher). While the sparsity level strongly depends on each par- 
ticular image, in all the cases is massively higher than the sparsity yielded by the DCT and 
DWT. Since the computational time is very competitive, we confidently conclude that the 
mixed dictionaries under consideration are suitable for achieving highly sparse representation 
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of astronomical images. 

The paper is organized as follows: Sec. 2 discusses highly nonlinear approximation tech- 
niques and introduces the discrete B-spline based dictionaries which, together with the RDC 
dictionary, form the highly coherent mixed dictionary that provides the basic elements for 
representing an image. Matching Pursuit like selection techniques are also discussed in this 
section. In particular, the proposed Self Projected Matching Pursuit strategy is established 
as a possible alternative to Orthogonal Matching Pursuit, when the latter cannot be im- 
plemented due to storage requirements. Sec. 3 illustrates the capability and effectiveness of 
the approach to yield fast sparse representation of astronomical images. The conclusions are 
presented in Sec. 4. 

2. Sparse representation by highly nonlinear approximation techniques 

We start by introducing some basic notation: M and N represent the sets of real and natural 
numbers, respectively. Boldface letters are used to indicate Euclidean vectors or matrices, 
whilst standard mathematical fonts indicate components, e.g., d G is a vector of com- 
ponents d{i), i = 1, . . . ,N and I e M^^^^f a matrix of elements i = 1, . . . , N^, j = 

Let V = {d„ G K^j^li be a spanning set for an inner product space Vjv of finite dimension 
N and f G a signal to be approximated by an element G = spanjd^.}^]^, i.e., 

K 

= ^ c(i)d^., where K < N. (1) 

i=l 

When N = M and the spanning set V is linearly independent it is a basis for Vat, otherwise 
it is a redundant frame [19]. In order to advance in the discussion as to how to select from 
V the i^-elements d^., i = 1, . . . , in (1), we need to discriminate two different situations: 

\) M — N and {d^}^! is an orthogonal basis for Vat. 

ii) < and {dj}j^^ is a non orthogonal and not necessarily linearly independent 
spanning set for Vat. 

Case i) leaves rooms for the linear and nonlinear forms of selecting the elements d^., i = 
1, . . . , Both types of approximation are easily realized in practice. A linear procedure 
determines before hand a fixed order for the elements of V and uses, say the first K elements, 
for the approximation. On the contrary, a nonlinear procedure would make the selection 
dependent on the signal to be approximated. For example: it is well known that in order 
to construct the approximation of f , such that ||f^ — f || is minimum (where || • || is the 
square norm induced by the inner product) one should select the elements de^, i — 1, . . . , K 
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corresponding to the K coefficients c{i) — (d^, f), i = 1, . . . , X of largest absolute value. This 
approximation is nonlinear, but the implementation in finite dimension is straightforward. 

On the contrary, case ii) introduces an intractable problem. If K is fixed, the choice of 
the K elements d^., i — 1,...,K minimizing ||f^ — f|| involves a combinatorial problem. 
Moreover, the alternative situation; the one of finding the minimum value of K such that 
||f''^ — f II < p, for a given tolerance p, is also intractable. This is the reason why this type of 
approximation is said to be highly non linear, and in practice is addressed in some suboptimal 
manner. Rather than looking for the sparsest solution (minimum value of K) one looks for 
a 'sparse enough solution'. This means that the number of i^-tcrms in (1) is 'small enough' 
for the representation to be convenient in the particular context. 

Usually highly non linear approximations of a signal f using a dictionary V — {di}^i are 
realized by: 

a) Expressing as X^i^i c(i)di and finding nonzero coefficients by minimization of 
the 1-norm ||c||i = X^^i |c(i)| [20]. 

b) Using a greedy pursuit strategy for stepwise selection of the K normalized to unity 
elements d^. i = 1,...,K, called atoms, for producing the approximation — 
X^^ic(i)d4, which is termed atomic decomposition. 

We restrict considerations to greedy pursuit algorithms because, for the highly coherent dic- 
tionaries we are considering, are more effective and faster than those based on minimization 
of the 1-norm. 

2. A. Matching Pursuit based selection techniques 

The greedy selection strategy Matching Pursuit (MP) was introduced with this name in 
the context of signal processing by S. Mallat and Z. Zhang [21]. Previously it had appeared 
as a regression technique in the statistical literature, where the convergence property was 
established [22] . The implementation is very simple and evolves by successive approximations 
as follows. 

Let R*^ be the A;-th order residue, R^ = f — f^, and Ik the index for which the corresponding 
dictionary atom d^^. yields a maximal value of | (dj, R^) |, i = 1, . . . M. Starting with an initial 
approximation f ^ = and R^ = f the algorithm iterates by sub-decomposing the k-th order 
residue into 

R'= = (d„,R')d, + R''+\ n = l....,M, (2) 

which defines the residue at order A; -|- 1. Since R^"*"^ given in (2) is orthogonal to all d„, n — 
1, . . . , M, it is true that 

||R'=||2 = |(dn,R'=)|2 + ||R'=+i^ n=l,...,M. (3) 
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Hence, the dictionary atom d^^ yielding a maximal value of [(R'^jd^)! minimizes ||R'^+-'^|p. 

Prom (2) it follows that at each iteration k the MP algorithm results in an intermediate 
representation of the form: 

f = f + R*^+^ (4) 

with 

k 

f'= = J](d,„,R")d,„. (5) 

n=l 

It was first proved in [22] that in the limit A; — >■ oo the sequence f*^ converges to -Pvjv^f, 
the orthogonal projection of f onto Vm — span{d„}^^ (the proof is translated to the MP 
context in [21]). Nevertheless, if the algorithm is stopped at the /cth-iteration, f*^ recovers an 
approximation of f with an error equal to the norm of the residual R'^'^^ which, if the selected 
atoms are not orthogonal, will not be orthogonal to the subspace they span. An additional 
drawback of the MP approach is that the selected atoms may not be linearly independent. 

A refinement to MP, which does yield an orthogonal projection approximation at each 
step, has been termed Orthogonal Matching Pursuit (OMP) [23]. In addition to selecting only 
linearly independent atoms, the OMP approach improves upon MP numerical convergence 
rate and therefore amounts to be, usually, a better approximation of a signal after a finite 
number of iterations. OMP provides a decomposition of the signal as given by: 

k 
n=l 

where the coefficients c^(n) are computed in such a way that it is true that 

k 

J2c'{n)de^^pYj, with = span{d^J^=i. 

n=l 

Thus, OMP yields the unique element f'^ G V/,. minimizing [jf'^' — f [|. The superscript of c*^(n) 
in (6) indicates the dependence of these quantities on the iteration step k. 

The OMP approach is effective for processing signals up to some dimensionality. It may 
become prohibitive, because of its storage requirements, when the signal dimension exceed 
some value. In this respect, MP has the advantage of being suitable for processing very 
large dimensional signals and, for 2D images, it fully exploits the separability property of 
dictionaries. Since our mixed dictionaries are adequate for block processing, in general the 
OMP approach is an appropriate technique. However, one of the aims of the present effort is 
to study, in a standard personal computer, the dependence of the sparsity of a representation 
with respect to the block size of an image partition. For this purpose, we are forced to 
overcome storage requirements of the standard OMP implementations. The goal is achieved 
by applying the refinement to the MP method proposed in the next section. 
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2.B. Self Projected Matching Pursuit 

The seminal paper [21] discusses a possible improvement of the MP approximation by means 
of back-projection steps, which stands for computing the orthogonal projection of the MP 
approximation. The authors suggest this could be done by the conjugate gradient method. 
Unfortunatelly the processing time of that method is not affordable in practice for large 
dimensional problems, and specially with highly correlated dictionaries. Thus, the question 
we have tried to answer is: 

Since the MP approach converges asymptotically to the orthogonal projection onto the span 
of the selected atoms, would it be affective to use MP itself to compute the back-projection 
steps? 

Of course there is an increment of step wise complexity but, as the example presented here 
illustrates, on the whole the refinement may perform better and faster. 

The resulting method, that we have termed Self Projected Matching Pursuit (SPMP) 
evolves as follows. Given a dictionary V = {dn}n=i and a signal f , set S = {0} and R = f . 
Assuming that the required projection step is of length p, implement the algorithm below. 

i) Apply MP up to step p selecting atoms from dictionary T> = {dn}n=i- Assuming that 
the distinct selected atoms are {d^„}^=i assign S -i^ S U {d^„}^=i- Set K equal to the 
cardinality of the updated S. Let us denote as the approximation of f so far and as 

the residue R^ = f - f^. 

ii) Approximate R^ using only the selected set S as the dictionary, which guarantees the 
asymptotic convergence to the approximation PsR"^ of R-^, where S = span^*, and a 
residue R-*- — — PgR^ having no component in S. 

iii) Set R ■<— R-*- and f ^ •<— f + PgR and repeat steps i) and ii) until, for a required p, 
the condition ||R|| < p is reached. 

For p = 1 the above refinement gives, asymptotically, the orthogonal projection approxima- 
tion at each iteration, thereby reproducing the results of OMP. As illustrated by the example 
below, significant improvement upon the original MP approach may be achieved for values 
of p larger than one. 

Example. This numerical example is a hard test for MP. Consider the Redundant Discrete 
Cosine (RDC) dictionary T>i given by: 

V^ = {v,; v,{j) = cos( ^^^^" ~ ~ ), j = 1, . . . , N}^^„ (7) 

with Wi, i = 1, . . . , M normalization factors. For M = N this set is a Discrete Cosine (DC) 
orthonormal basis for the Euclidean space M^. For M = 2zN, with 2; G N, the set is a 
RDC dictionary with redundancy 2z, which will be fixed equal to 2. To represent the chirp 
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Fig. 1. Chirp signal approximated up to error p = 0.001 ||f|| hji) K = 683 orthogonal 
DC components taken from (7) with N = M = 2000. ii) K = 286 atoms taken from 
(7) with M = 2N = 4000 using OMP or K = 1638 using MP . iii) K = 300 atoms 
taken from (7) with M = 2N = 4000 using SPMP with p = 10 oi K = 286 with 
p = 3. 

signal cos(27rt^) depicted in Fig. 1 we take an equidistant partition of the interval [0, 8] 
consisting oi N — 2000 points and sample the chirp at those points /(i), i — 1, . . . , N. The 
aim is to find an approximation of these points up to precision p — 0.001||f||. Considering 
M — N — 2000 in the above definition of Vi we have an orthonormal basis and therefore 
both MP and OMP methods give the sparsest decomposition of the signal in orthogonal 
DC components. For an approximation to the given precision (coinciding visually with the 
theoretical chirp in Fig. 1) it is necessary to use K — 683 orthogonal elements from (7). Now, 
setting M — 2N — 4000 the dictionary T>i is no longer an orthogonal basis but a redundant 
tight frame [19] and the algorithms MP and OMP produce very different decompositions. 
While OMP improves the sparsity of the representation requiring only K = 286 components, 
MP needs K = 1638 different atoms, i.e. significantly more than with the orthonormal basis. 
The reason for the poor performance of MP is that in the redundant dictionary the atoms 
are highly correlated and the method is picking linearly dependent atoms, something that 
cannot occur with OMP. However, when applying the proposed refinement SPMP with 
projection step p = 10 the number of required components drops to K = 300. For p = 3 
the number of required components is that of OMP, i.e. K = 286. While in this example 
there is no need for the SPMP approach, because the already established algorithm OMP 
performs the decomposition faster, the result illustrates the fact that SPMP can provide an 
effective alternative to OMP when, as is the case with 2D images, OMP becomes slow or its 
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storage demands cannot be met. Further details for the 2D implementation of SPMP will 
be discussed in Sec. 2.C. Before that, we shall introduce the proposed mixed dictionaries for 
representing astronomical images. 

2.B.I. Building mixed dictionaries for sparse representation of astronomical images 

Assume that the iC-sparse representation of a given image I e M^^^^^', N^^Ny e N, is 
represented as 

I^ = ^c^(i)D,,, (8) 

1=1 

where the elements D^. e M'^'^^-^'', i — 1, . . . , X in (8), are to be selected from a dictionary 
"D = {-Dj}^^ which is obtained as the Kronecker product T> — T>^ "D^ of the dictionaries 
= {d^^ G and = {d^ e R^yj^l^. In this section we discuss a particular 

dictionary V, which will be shown to be adequate for sparse representation of astronomical 
images. 

Redundant Discrete Cosine (RDC) Dictionary 

As already mentioned in order to sparsely represent the fairly smooth regions of the images 
being considered, one of the components of the proposed mixed dictionary is chosen to be 
a RDC dictionary X'l introduced in the last section, fixing M = 2N so as to have a RDC 
dictionary with redundancy two. 

Redundant Discrete B-Spline (RDBS) based dictionaries 

The other component of the proposed mixed dictionary, which allows for the representation 
of bright spots and edges, is inspired by a general result holding for continuous spline spaces. 
Namely, that spline spaces on a compact interval can he spanned by dictionaries of B- splines 
of broader support than the corresponding B-spline basis functions [26, 27]. This may result 
in a very considerable gain in sparsity for functions well approximated in spline spaces. Here 
we consider equally spaced knots so that the corresponding B-splines are called cardinal. All 
the cardinal B-splines of order m can be obtained from one cardinal B-spline B{x) associated 
with the uniform simple knot sequence A = 0,l,...,m. Such a function is given as [28] 

B^{x)^-Y,{-irrYx-i)r\ (9) 

i=0 ^ ^ 

where {x — is equal to {x — i)"^"^ if x — i > and otherwise. We shall consider 

only B-Splines of order m — 1 and m = 4 and include associated derivatives. For m — 2 
the corresponding space is the space of piece wise linear functions and can be spanned by a 
linear B-spline basis, or dictionaries of broader support, arising by translating a prototype 



8 



'hat' function. Equivalently, the cubic sphne space corresponding to m = 4 is spanned by 
the usual cubic B-sphne basis, or dictionaries of cubic B-sphne functions of broader support. 
Details on how to build B-sphne dictionaries are given in [26,27]. The numerical construction 
of the cases m — 2 and m = 4 considered here is very simple and arises by translations of 
the prototype functions given below: 



y if 0<a;</ (10a) 

Bi{x) = ^2-y if l<x<2l (10b) 

otherwise. (10c) 
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The B-spline basis for the cardinal spline space corresponding to m = 2 is constructed by 
considering Z = 1 in (10) and translating the prototype every knot. Dictionaries for the 
identical space of functions of broader support arise by setting i G N in order to fix the 
desired support. The B-spline basis for the cubic cardinal spline space, corresponding to 
m = 4, requires to set / = 1 in (11) and translate the concomitant prototype. Dictionaries 
are obtained by taking larger values of /. 

As discussed below, derivatives of the above functions also provide suitable prototypes 
to achieve higher levels of sparsity in the representation of a signal. Now, for constructing 
dictionaries for digital image processing we need to 

a) Discretize the functions to obtain adequate Euclidean vectors. 

b) Restrict the functions to intervals which allows images to be approximated in small 
blocks. 

We carry out the discretization by taking the value of a prototype function only at the knots 
(c.f. small circles in graphs Fig. 2) and translating the prototype one sampling point at each 
translation step. At the boundaries we apply the 'cut off' approach and keep all the vectors 
whose support has nonzero intersection with the interval being considered. 
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Remcirk 1. It is worth mentioning that by the proposed discretization the hat B-spline basis 
for the corresponding interval becomes the standard Euclidean basis. By discretizing the hats 
of broader support the samples preserve the hat shape. 

Obviously for a finite dimension Euclidean space one can construct arbitrary dictionaries. 
In particular, redundant B-spline based dictionaries with prototypes of different support and 
shapes arising from the functions (10) and (11) and their corresponding derivatives. 

Indicating as d^S^(x) the derivative of B'^{x) and as d^S^(a;) its second derivative, 
in addition to linear and cubic B-splines we shall consider the additional prototypes 
d^B2{x), d^B{{x) and d'^Bl{x). Vectors of different support may be included by merging those 
dictionaries. For our experiments we construct the Redundant Discreet B-Sphne (RDBS) 
based dictionaries as follows: 

V,^{b,Y:^{j-i)\N;j^l,...,L}fi\, m = 2,4, s = 2,...,9, 

where the notation Ym{j — i)\N indicates the restriction to be an array of size and hi, i = 
1, . . . ,Mg are normalization constants. The arrays Y2, s = 2,3,4, Y^, shown consecutively 
in the left graph of Fig. 2, and Y2, s = 5,6, F/,s = 8,9 shown consecutively in the right 



graph of the same figure, are defined as follows: 

i Bl,l = 1, 2, 3 for s=2,3,4 (respectively) (12a) 

^ [d^B^, 1^2,3 for s=5, 6 (respectively). (12b) 

{Bj for s=7 (13a) 

d^Bl for s=8 (13b) 

d'^Bl for s=9. (13c) 



The cut off approach applied to the boundaries implies that the numbers Mg of total atoms 
in the sth-dictionary varies according to the atom's support. 

Taking N = N^, an unidimensional mixed dictionary, T)^, results by joining dictionary 
Vi (c.f. (7)) and the above defined RDBS ones, i.e. V = D^^Vs,. Taking iV = A^y an 
equivalent mixed dictionary, V^, is obtained. The mixed dictionary V for K^^^^y is the 
Kronecker product T? = T)^ ® V^. However, as discussed below, this 2D dictionary does not 
need to be constructed. This advantage represents a huge save in memory requirements. 

2. C. 2D implementation of the selection strategies with separable dictionaries 

Given an image I G M^-^^f and two ID dictionaries 1?^ = {d^ G and = {d?^, G 

^^"Im^i the greedy procedure 0MP2D for approximating I with atoms taken from T>^ and 
Vy iterates as follows. 
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Fig. 2. Prototype atoms as defined in (12) and (13). The RDBS component of 
the dictionary is constructed by translation or these atoms, applying the cut off 
approach at the boundaries. 



On setting R° = I at iteration k + 1 the algorithm selects the atoms dL G P^' and d^^ 
that maximize the absolute value of the Frobenius inner products (d^, R'^d^)F, n 
1, . . . ,M^, m = 1, . . .,My, i.e., 



tk+iJ\+, = argmax | V <(i)i?'=(z, j)C(j)l 

n=l,...,Mx ~l 
m=l,...,My 

With (14) 

k 

R'{z,j) = -Y,c\n)dUi)d'Aj)- 

n=l 

The coefficients c'^(n), n = 1, . . . , k in the above expansion are such that ||R'^||f is minimum, 
where || ■ ||f is the Frobenius norm. This is ensured by requesting that R'^ = I — -Pvfel, where 
Pvfc is the orthogonal projection operator onto = spaii{d^^ ® d^yj^^^ A straightforward 
generalization of the implementation discussed in [24, 25] for the ID case provides us with 
the representation of as given by 

k k 

Pv,I = Yl A-(B- = E c'H An, (15) 

n=l 71=1 

where each A„ G M^^^^f is an array with the selected atoms A„ = dfx ® d^y and B^, n = 
1, . . . ,k are the concomitant reciprocal matrices, which are the unique elements of M^^^^y 
satisfying the conditions: 



i) (A„,B^)f = 5n,, 



1 lin = m 
if n 7^ m. 
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ii) V, = span{B^}^ 



Such matrices can be adaptively constructed through the recursion formula: 




where 



(16) 




Cfe+i/||Cfc+i|||, with C 



1 — Ai and C^+i — A^+i — ^ " (^n, Afe+i)F 

n=l " 



For numerical accuracy in the construction of the set C„, n — 1,...,A; + 1 at least one 
re-orthogonalization step is usually needed. It implies that one needs to recalculate these 
matrices as 



The coefficients in (15) are obtained from the inner products c^{n) = (B^, I)f, n = 1, . . . ,k. 
The algorithm iterates up to step, say K, for which, for a given p, the stopping criterion 
||I -I^IIf < p is met. The MATLAB function 0MP2D, and corresponding MEX file in C++ 
for faster implementation of the identical function, are available at [29]. 

Up to some block-size 0MP2D is very effective. It takes advantage of the separability 
property of the dictionary, except for the construction of the required matrices B^, n = 
1, . . . ,k. Unfortunately, for blocks larger than a certain size the concomitant storage demands 
are not available on a standard personal computer, or the computations became very slow 
and the above implementation of 0MP2D is no longer affective. As already mentioned, in 
order to avoid the storage and computation of matrices B^, n = 1, . . . ,k, we propose the 
SPMP method. Algorithms 1, 2, and 3 outline its implementation in 2D, which we term 
SPMP2D. Putting aside the complexity for the selection process, which is the same for 
both approaches, the complexity order for the procedure of including one more term in the 
approximation is 

• 0{kN^Ny) for 0MP2D 

• 0{kK,NxNy) for SPMP2D, where k, indicates the number of iterations to improve the 
MP2D approximation by self projections. 

The number k, is expected to depend on the correlation of the selected atoms. For the 
dictionaries and the class of images we are considering we can assert that /t is a small 
number. When this relation is fulfilled the complexity of both approaches are of equivalent 
order. However, as illustrated in the next section, the storage requirements of 0MP2D slow 
the processing significantly when the size of the blocks partitioning the images increases 
beyond some value. In such situations SPMP2D, which does not require the calculation or 




(17) 
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Algorithm 1 Implementation of the proposed SPMP2D method to approximate an image. 



Input: Image I e R^-x^«. Dictionaries = {d^ e ^^''}n=i and = {d^ e ^^'}n=i- 
Approximation error p > and tolerance e > for the numerical realization of the 
projection. Length of projection step p. 

Output: Approximated image lApprox £ R^^'^^y. Coefficients in the atomic decomposition 
c e R'^. Ordered pair of indices labehng the selected atoms F = {{'^n^ '^n)}n=i- 
{ Initialization} 

Set r = {0}, lApprox = 0, R = I k ^ 0, Errori = 2p 
SctC(2,j) = 0,1 = l,...,M,,j = l,...,My. 
{ Begin the algorithm} 
while Erroii > p do 

Apply Algorithm 2 {p-MP-iterations} to obtain: 

r = {(C,C)}Li,iApprox e M^^x^^ r = i - Upprox e M^^^iv- ^ e m^^^^m, 

{ Collect nonzero coefficients in c € M^} 
for n = 1 : do 

c{n) = c{e^,,eyj 

end for 

Apply Algorithm 3 { Improve approximation by orthogonal projection via MP } to 
update c G M''' so that lApprox ^ lApprox + A^^R-) R R — A/J.R {where = 
span{d?. ®d^,}^=J 
for n = 1 : A; do 

C{i^,iD = c(n) {Update of matrix with coefficients} 
end for 

Errori •<— ||I — lApprox ||f 

end while 



storage of Kronecker products, becomes a suitable alternative for orthogonalization of the 
MP approach. 

3. Numerical Experiments and Results 

The viability of using the mixed dictionary described in Section 2.B.1 to quickly approximate 
an image by either 0MP2D or SPMP2D follows from its suitability for block processing. This 
implies to divide the image I into small blocks, for independent approximation. 

Without loss of generality blocks are assumed to be square of size x pixels. Also 
for simplicity an image I will be assumed to be the composition of H identical blocks, i.e., 

I = Uplift, 
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Algorithm 2 p-plain MP iterations for atoms selection and collection of contributions of 

repeated atoms 
for t — 1 : p do 

for n = 1 : and m = 1 : My do 

Giri,m) ^^^f^ dUt)R{t,j)dl,{j) 

end for 

q^, — arg max \G{n, m)\ {MP selection of atoms} 

n,m=l,...,M 

{Store coefficients as a matrix adding contributions from repeated atoms} 
C{q^,qy)^C{q^,qy) + G{q^,qy) 
{Update of approximation and residual} 
for i — 1 : Nx and j = 1 : A^j^ do 

A^C{q^,qy)d^^.{t)dly{j) 

R{i,j) = R{t,j)-A 

I Approx{i, j) — -^Approx(^) j) + A 

end for 

if (g^, qy) i r then 

k^k^l, [II, II) ^ {q\ qy), r ^ r U {£%, ei) {Update set of indices} 
end if 
end for 

Algorithm 3 Orthogonal projection onto Vfe = span{d|^ (g) d^j; }^^^ via MP 
Set Errors = 2e 
while Errors > e do 
for n = 1 : A; do 

9(n)^Eif''d^eSR(i,j)dyy(j) 

end for 

q — arg max \g{n)\ 

n=l,...,k 

c{q) — c{q) + g{q) {Update coefficients} 
{Update approximation and residual} 
for i — 1 : Nx and j = 1 : A^^ do 
R(i, j) = R{hj) - df.^{i)dy^y{j)g{q) 
lApprox(«, j) = lApprox(«, j) + d^i.{:L)d\y{j)g{q) 
end for 
Errors ^ \g{q)\ 
end while 
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where every 1^ is an intensity array of size x N^, to be approximated as 

n=l 

In what follows the performance of our dictionary based approach is illustrated by recourse 
to numerical experiments. The sparsity is measured by the Sparsity Ratio (SR) defined as 

number of pixels HN"^ 
number of coeffients Y^^-i 

General setup 

The experiments have been realized in the MATLAB programming environment, on a 
laptop with a 2.4GHz Intel Core 2 Duo P8600 processor and 7.7GB of RAM. 

In all the cases the approximation tolerance is fixed to produce a sharp PSNR of 45dB for 
the complete image. For such a PSNR Mean Structure Similarity index (MSSIM) [30] with 
respect to the original image is very close to one (MSSIM > 0.98 for all the images). 

Unless explicitly specified the mixed dictionary is the one introduced in section Sec.2.B.l, 
i.e. the union of a RDC dictionary, redundancy two, and the RDBS dictionary arising by 
translation of the eight prototypes shown in Fig. 2. 

The comparison with the DCT and DWT refers to the nonlinear approach achieving, 
by thresholding of the DCT and DWT coefficients respectively, the required PSNR of 45 
dB. The DWT is apphed to the whole image using software implementing the Cohen- 
Daubechies-Feauveau 9/7 wavelet transform. 

Experiment I 

The aim is to evaluate the sparsity of an image representation, by the proposed mixed 
dictionary, against the size of the blocks partitioning the image. The results are compared 
with those yielded by the 2D version of the DCT and DWT orthogonal transforms. With this 
end in mind, numerical experiments were conducted using a set of images downloaded from 
the ESO website [17], converted to gray intensity levels, for two resolutions (publication and 
screen size). We include here full results corresponding to the two images in Fig. 3, which 
arc good representatives of the range of images in the data set tested. 

For a fixed PSNR of 45dB the SR is calculated by partitioning the corresponding image 
into square blocks of side length 8,16,24,32,40 and 48. Fig. 4 depicts the SR obtained, 
using the mixed dictionaries from Sec. 2.B.1, and 0MP2D, SPMP2D with projection step 
one (SPMP2Di) and ten (SPMP2Dio), and the 2D version of MP, for separable dictionaries, 
that we denote MP2D. Sparsity is also compared against results for the DCT (for the same 
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Fig. 3. The first image is the Nebula Orion (Messier 42 Ref. esolOOG) and the 
second the Spiral Galaxy NGC 4945 Ref. eso0931). Both images are taken from 
ESO [17] at publication and screen resolutions. The corresponding sizes (in pixels) 
are: 4000 x 3252 and 1280 x 1574 (nebula) 4000 x 4000 and 1280 x 1280 (galaxy). 

block size) and the DWT applied to the whole image. The points joined with different 
lines in the top left graph of Fig. 4 show results for the first image of Fig. 3 at the higher 
resolution (4000 x 3552 pixels). The results corresponding to the second image of Fig. 3, also 
at the higher resolution (4000 x 4000 pixels), are shown in the top right graph of Fig. 4. The 
bottom left and right graphs depict the same information as the top graphs but correspond 
to the screen size resolution of the images (size 1280 x 1574 pixels and size 1280 x 1280 
pixels respectively). 

Discussion of results 

Let us start by highlighting the fact that the results of Fig. 4 illustrate a massive gain in 
sparsity yielded by the dictionary approach, in comparison to the DCT and DWT. 

A clear feature in the results corresponding to the higher resolution images (top graphs 
of Fig. 4) is that, as opposed to the results for the DCT (decreasing curve in all the graphs 
of Fig. 4), the SR yielded by the dictionary approach increases with the block size, rapidly 
up to some value. For most images in the data set, and in particular for the two images of 
Fig. 3, block size 16 x 16 yields the best trade off between sparsity and processing time (c.f. 
Tables 1). 

The two bottom graphs confirm that, as should be expected, for the fixed PSNR of 45dB 
sparsity decreases with respect to the previous resolution and the block size has less relevance. 
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8 16 24 32 40 48 8 16 24 32 40 48 



Fig. 4. SR vs partition of side length 8, 16,24,32,40, and 48 yielded by 0MP2D, 
SPMP2D with projection step one (SPMP2Di) and ten (SPMP2Dio), MP2D, and 
the DCT. The constant dotted line corresponds to the DWT result and is plotted 
only for visual comparison, since the DWT is applied to the whole image. The left 
and right graphs correspond to the nebula and galaxy of Fig. 3, respectively. The top 
graphs correspond to the higher resolution 4000 x 3252 pixels and 4000 x 4000 pixels 
respectively. The bottom graphs correspond to the lower resolution 1280 x 1574 and 
1280 X 1280 pixel images respectively. 
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Notice that for the lower resolution image of the nebula, the SR shown in the bottom 
left graph of Fig. 4 becomes almost uniform for block sizes larger than 16 x 16 pixels. For 
the lower resolution image of galaxy the SR shown in the bottom right graph of Fig. 4 
increases with the block size, but much less than for the same image at higher resolution 
(top right graph). Table 1 compares the time (average of 5 independent runs) spent by 
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209 
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29.97 
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29.88 


392 


29.30 


382 


26.46 


387 
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30.36 


1065 


30.24 
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29.80 


640 


26.90 


660 


48 X 48 


30.78 


2041 


30.60 


1055 


30.25 


1015 


27.20 


1032 
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79 


16.85 


85 


16.49 


89 


15.95 


79 


16 X 16 


20.51 


163 


20.50 


185 


19.84 


154 


18.73 


147 


24 X 24 


21.27 


413 


21.23 


362 


20.61 


354 


19.12 


328 


32 X 32 


21.59 


916 


21.52 


694 


20.93 


666 


19.16 


653 


40 X 40 


21.70 


1989 


21.59 


1494 


21.02 


1145 


19.13 


1139 


48 X 48 


21.68 


4031 


21.53 


2477 


21.04 


1919 


19.11 


1853 



Table 1. SR and execution time, in sees, for approximating a complete image with 
different approaches and different sized blocks partitioning the image. The top half 

of the table corresponds to the results for the nebula at publication size resolution 
(4000 X 3252) pixels. The bottom half contains the results for the galaxy image at 
the equivalent resolution (4000 x 4000 pixels). 

the methods considered here, using the proposed dictionary, vs the size of the blocks 
partitioning the image. As can be observed, 0MP2D implemented as described in Sec. 2.C 
is slightly faster than SPMP2D with projection step one, up to block size 24 x 24, and 
becomes noticeable slower beyond that block size. This behavior is not a consequence of 
mathematical complexity, which as discussed in Sec 2.C in the best scenario are at most of 
the same order, but as a consequence of storage requirements. As the block size increases, 
the poor execution time scaling for 0MP2D is a result of the additional memory required, 
over SPMP2D, for the storage of matrices B and C (c.f. (16)). Another interesting feature 
is that the results of SPMP2D with projection step larger than one do not improve the 
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processing time significantly. 
Experiment II 

This experiment comprises a data set composed of fifty five images at screen size resolution, 
all of them in the category of nebulae, galaxies, and stars, taken from the top 100 images on 
the HST website [18]. Table 2 displays the average SR for the set, denoted as SR as well as 
the average processing time, t, per image in the set, using 0MP2D, SPMP2Di, SPMP2Dio, 
and MP2D, with partitions of square blocks of sides 8, 16, 24, 32, and 40. The average size of 
the images in the set is 1264 x 1194 pixels. 
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8x8 


12.36 


11.26 


12.46 


13.70 


12.18 
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14.35 


38.11 


14.42 


45.23 


14.13 


34.22 


13.21 


28.52 


24 X 24 


14.94 


113.27 


14.96 


111.44 


14.74 


91.11 


13.59 


70.66 


32 X 32 


15.23 


326.09 


15.22 


237.79 


15.05 


207.47 


13.78 


134.73 


40 X 40 


15.36 


839.47 


15.31 


447.20 


15.17 


397.81 


13.83 


239.10 



Table 2. Average SR (SR) and average processing time (t) per image (in sees) for 
approximating, up to a PSNR of 45 dB, a set of 55 images from the HST website. 
Both quantities are displayed against the block size partitioning the images. The 
average size of the images in the set is 1264 x 1194 pixels. Note: the average times 
per image are also the average of 5 independent runs for each given block size. 

Now we are also interested in testing the proposed mixed dictionary against other possible 
mixed dictionaries. Preliminary experiments have shown that all combination of dictionaries 
containing a RDC (with redundancy two) component perform better than combinations 
without this component. Considering the preliminary tests we compare here the results 
obtained with the dictionaries of Sec. 2.B.1, against other mixed dictionaries containing a 
RDC component. The Euclidean basis is also included in all dictionaries. The comparison is 
carried out with respect to the remaining localized atoms. The RDBS dictionary is replaced 
by another one constructed in an equivalent manner using the prototypes in the top graphs 
of Fig. 5. The atoms of support 2,4,6, and 8, represented in the top right graph of Fig. 5, 
are discretized versions of Haar wavelets. The other prototypes are discrctized versions of 
the continuous wavelets given in [31], the form of which is very similar to the Mexican 
Hat wavelet. Three fractional scaling parameters were used to produce discrete wavelets 
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Fig. 5. The two top graphs are the prototype atoms defining the RDW dictionary. 
The two bottom graphs are prototype atoms of random shape defining a realization 
of the RR dictionary. 



of support 3, 5, and 7, represented in the top left graph of Fig. 5. We call this dictionary 
Redundant Discrete Wavelet (RDW) dictionary. For further comparison we constructed a 
random dictionary from normal distributed random atoms of support equal to the atoms of 
the dictionary we are testing against. We call such a dictionary Redundant Random (RR) 
dictionary. The prototypes corresponding to a particular realization of the random shapes 
are shown in the bottom graphs of Fig. 5. For the experiment, five different realizations of a 
RR dictionary were tested. All the realizations produce similar results. The results displayed 
in Table 3 correspond to the average of the five realizations. Thus, the corresponding sparsity 
ratio SR is a double average. Namely, the average sparsity ratio SR for the set of fifty five 
images and the average of this quantity SR with respect to the five realizations of the RR 
dictionary. As already mentioned the SR does not depend significantly on the dictionary 
realization (the standard deviation of SR with respect to the five random realizations is, 
for all the block sizes, less than 4% of the given SR values). The standard deviation with 
respect to the fifty five images is also the average cfsR corresponding to the five realization of 
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the RR dictionary. Table 3 also shows the SR and iTsr for mixed dictionaries corresponding 
to the components RDBS and RDW and the corresponding standard deviations. All the 
dictionary results in Table 3 are obtained using the 0MP2D method for block sides 8, 16, 24 
and 32. The results from the DCT and DWT are also displayed. 

Discussion of Results 

This experiment confirms statistically the gain in sparsity obtained with the proposed 
dictionaries with respect to the DCT and DWT. It also confirms that the proposed approach 
SPMP2Di is a faster option for the implementation of 0MP2D when the image partition is 
larger than 24. 

It is clear from Table 3 that, while the RDW and RR dictionary components produce 
comparable results, the differences with the RDBS component arc significant, specially for 
the larger partition sizes. However, comparison with the SR yielded by the DCT and DWT 
leads to the conclusion that it is the combination of a RDC dictionary with well localized 
atoms of arbitrary shape which yields a significant improvement in the sparsity of high quality 
approximations of astronomical images. Atoms of particular shape, such as the prototypes 
in the RDBS component improve sparsity even further. 
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14.35 
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12.38 
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4.2 


6.39 
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Table 3. Mean (SR) and variance (o"sr) of the SR obtained with different mixed 
dictionaries, by partitioning the images into blocks of size 8, 16, 24 and 32 and 
applying the 0MP2D approach with RDCT-RDBS, RDCT-RDW and RDCT-RR 
dictionaries. The results from the approximation with the transforms DCT and WT 
are also shown. 

4. Conclusions 

Sparse representation of astronomical images has been considered. A mixed dictionary com- 
posed of a RDC component and a RDBS component was proposed. Using a data set of fifty 
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five astronomical images in the category of nebulae, galaxies, and stars, the dictionary was 
shown to be suitable for sparse representation of that class of images. Prom the experiments 
involving atoms of different shapes one can assert that the combination of a RDC component 
with a component of localized atoms of different support yields the most important gain in 
sparsity, with respect to results from the popular transforms DCT and DWT. Nevertheless, 
the proposed particular shape of the RDBS component represents an advantage over other 
possible atoms of equal support, and yields an impressive sparsity gain over DCT and DWT 
results. 

The fact that the proposed dictionary is suitable for block processing by the selection 
technique 0MP2D makes the resulting approach very effective in terms of processing time. 
For the data set characterized by an average sparsity ratio of 12.6 (with standard deviation 
6.2) the processing time, for partition size 8 x 8, is only 11.26 sees per image of average size 
of 1264 X 1194 pixels. This should be appreciated taking into account that the time refers 
to executing a C++ MEX file in a MATLAB environment, using a small laptop with the 
specification details given in Sec. 3. 

Specially for high resolution images, sparsity may significantly increase with the parti- 
tion size. In order to handle these cases, a greedy strategy taking full advantage of the 
separability property of the proposed dictionaries was considered. The approach has been 
termed SPMP2D, because it allows to orthogonalize the seminal MP technique by self- 
projections. Through the experiments the technique was established as a convenient alterna- 
tive to 0MP2D, when the latter scale badly due to storage demands, or the storage capacity 
is not available. 

Finally we would like to highlight that, even though for small partitions the approach is 
fast for sequential computing, the possibility of its parallel implementation is only a question 
of resource availability. The approach obeys a scaling law by independently processing the 
blocks partitioning the image. Hence, a straightforward multiprocessor implementation would 
reduce the processing time of the sequential implementation by a factor approximatelly equal 
to the number of processors. 

The results presented in this Communication are truly encouraging and we feel confident 
that the approach will benefit applications relying on sparse representation of digital images. 
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